NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Hopps: Leveraging Sparsity to Accelerate Automata Processing

https://doi.org/10.1145/3676642.3736126

Du, Xingran; Emer, Joel S; Sanchez, Daniel (August 2025, ACM)

Free, publicly-accessible full text available August 6, 2026
Toward a Universal Cryptographic Accelerator

https://doi.org/10.1109/MC.2024.3481808

Devadas, Srinivas; Sanchez, Daniel (January 2025, Computer)

Full Text Available
From Auto-Kirigami to Drumheads: Suspended, Self-Tearing, and Strained Graphene Nanostructures Formed by Nanoindentation

https://doi.org/10.1021/acs.langmuir.4c03914

Sanchez, Daniel A; Yuan, Li; Leventhal, Anna; Fulco, Sage; Cross, Graham_L W; Carpick, Robert W (February 2025, Langmuir)

Free, publicly-accessible full text available February 25, 2026
Accelerating Zero-Knowledge Proofs Through Hardware-Algorithm Co-Design

https://doi.org/10.1109/MICRO61859.2024.00035

Samardzic, Nikola; Langowski, Simon; Devadas, Srinivas; Sanchez, Daniel (November 2024, IEEE)

Full Text Available
Terminus: A Programmable Accelerator for Read and Update Operations on Sparse Data Structures

https://doi.org/10.1109/MICRO61859.2024.00092

Lee, Hyun Ryong; Sanchez, Daniel (November 2024, IEEE)

Sparse data structures like hash tables, trees, or compressed tensors are ubiquitous, but operations on these structures are expensive and inefficient on current systems. Prior work has proposed hardware acceleration for these operations, but these techniques have two key shortcomings: they limit the types of data structures they support, and they focus on reads but do not support fine-grained updates to these structures. We present Terminus, a programmable accelerator for read and update operations on sparse data structures. Terminus extends each general-purpose core with a programmable dataflow engine capable of accelerating a wide range of structures and operations. Terminus engines are flexible yet simple, as they focus on common operations and defer rare, complex ones to cores. Terminus features a simple concurrency control mechanism based on address ranges that enables safe updates while preserving parallelism. We evaluate Terminus on serial and parallel benchmarks on a wide range of sparse data structures. Terminus improves performance by gmean 7.4x over a CPU baseline, showing that Terminus can accelerate fine-grained reads and writes that were previously not possible in prior accelerators for sparse structures.
more » « less
Full Text Available
Trapezoid: A Versatile Accelerator for Dense and Sparse Matrix Multiplications

Yang, Yifan; Emer, Joel S; Sanchez, Daniel (July 2024, IEEE)

Full Text Available
Azul: An Accelerator for Sparse Iterative Solvers Leveraging Distributed On-Chip Memory

https://doi.org/10.1109/MICRO61859.2024.00054

Feldmann, Axel; Golden, Courtney; Yang, Yifan; Emer, Joel S; Sanchez, Daniel (November 2024, IEEE)

Full Text Available
A Tensor Compiler with Automatic Data Packing for Simple and Efficient Fully Homomorphic Encryption

https://doi.org/10.1145/3656382

Krastev, Aleksandar; Samardzic, Nikola; Langowski, Simon; Devadas, Srinivas; Sanchez, Daniel (June 2024, Proceedings of the ACM on Programming Languages)

Fully Homomorphic Encryption (FHE) enables computing on encrypted data, letting clients securely offload computation to untrusted servers. While enticing, FHE has two key challenges that limit its applicability: it has high performance overheads (10,000× over unencrypted computation) and it is extremely hard to program. Recent hardware accelerators and algorithmic improvements have reduced FHE’s overheads and enabled large applications to run under FHE. These large applications exacerbate FHE’s programmability challenges. Writing FHE programs directly is hard because FHE schemes expose a restrictive, low-level interface that prevents abstraction and composition. Specifically, FHE requires packing encrypted data into large vectors (tens of thousands of elements long), FHE provides limited operations on these vectors, and values have noise that grows with each operation, which creates unintuitive performance tradeoffs. As a result, translating large applications, like neural networks, into efficient FHE circuits takes substantial tedious work. We address FHE’s programmability challenges with the Fhelipe FHE compiler. Fhelipe exposes a simple, numpy-styletensorprogramming interface, and compiles high-level tensor programs into efficient FHE circuits. Fhelipe’s key contribution isautomatic data packing, which chooses data layouts for tensors and packs them into ciphertexts to maximize performance. Our novel framework considers a wide range of layouts and optimizes them analytically. This lets compile large FHE programs efficiently, unlike prior FHE compilers, which either use inefficient layouts or do not scale beyond tiny programs. We evaluate on both a state-of-the-art FHE accelerator and a CPU. is the first compiler that matches or exceeds the performance of large hand-optimized FHE applications, like deep neural networks, and outperforms a state-of-the-art FHE compiler by gmean 18.5. At the same time, dramatically simplifies programming, reducing code size by 10–48.
more » « less
Full Text Available
Spatula: A Hardware Accelerator for Sparse Matrix Factorization

Feldmann, Axel; Sanchez, Daniel (October 2023, Proceedings of the 56th annual IEEE/ACM international symposium on Microarchitecture (MICRO-56))

Full Text Available
Phloem: Automatic Acceleration of Irregular Applications with Fine-Grain Pipeline Parallelism

https://doi.org/10.1109/HPCA56546.2023.10071026

Nguyen, Quan M.; Sanchez, Daniel (February 2023, Proceedings of the 29th international symposium on High Performance Computer Architecture (HPCA-29))

Full Text Available

« Prev Next »

Search for: All records